Compaction Management in Distributed Key-Value Datastores

Authors

Muhammad Yousuf Ahmad

Bettina Kemme

Abstract

Compactions are a vital maintenance mechanism used by datastores based on the log-structured merge-tree to counter the continuous buildup of data files under update-intensive workloads. While compactions help keep read latencies in check over the long run, this comes at the cost of significantly degraded read performance over the course of the compaction itself. In this paper, we offer an in-depth analysis of compaction-related performance overheads and propose techniques for their mitigation. We offload large, expensive compactions to a dedicated compaction server to allow the datastore server to better utilize its resources towards serving the actual workload. Moreover, since the newly compacted data is already cached in the compaction server’s main memory, we fetch this data over the network directly into the datastore server’s local cache, thereby avoiding the performance penalty of reading it back from the filesystem. In fact, pre-fetching the compacted data from the remote cache prior to switching the workload over to it can eliminate local cache misses altogether. Therefore, we implement a smarter warmup algorithm that ensures that all incoming read requests are served from the datastore server’s local cache even as it is warming up. We have integrated our solution into HBase, and using the YCSB and TPC-C benchmarks, we show that our approach significantly mitigates compaction-related performance problems. We also demonstrate the scalability of our solution by distributing compactions across multiple compaction servers.
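
The prefetch-before-switch idea in the abstract can be illustrated with a toy model. This is only a minimal sketch under loud assumptions, not the authors' HBase implementation or their warmup algorithm: data files are plain dictionaries, the cache maps keys to values instead of caching file blocks, and the names `compact`, `DatastoreServer`, and `switch_to` are hypothetical. It shows only why filling the local cache from the already compacted data before switching over avoids the burst of cache misses that follows a cold switch.

```python
from typing import Dict, List, Optional


def compact(files: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge data files ordered oldest to newest; the newest value per key wins."""
    merged: Dict[str, str] = {}
    for data_file in files:
        merged.update(data_file)
    return dict(sorted(merged.items()))


class DatastoreServer:
    """Toy server (hypothetical): reads go through a local cache in front of the files."""

    def __init__(self, files: List[Dict[str, str]]) -> None:
        self.files = list(files)  # oldest to newest
        self.cache: Dict[str, str] = {}
        self.misses = 0

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        self.misses += 1  # cache miss: fall back to the (simulated) filesystem
        for data_file in reversed(self.files):  # newest file first
            if key in data_file:
                self.cache[key] = data_file[key]
                return data_file[key]
        return None

    def switch_to(self, compacted: Dict[str, str], prefetch: bool) -> None:
        """Replace the old files with the compacted one.

        Assumption: the old cache contents become useless after the switch.
        With prefetch=True the new cache is filled from the compacted data
        (standing in for a fetch from the compaction server's memory) before
        the switch; with prefetch=False the server starts with a cold cache.
        """
        new_cache = dict(compacted) if prefetch else {}
        self.files = [compacted]
        self.cache = new_cache


files = [{"a": "1", "b": "1"}, {"b": "2", "c": "3"}]
compacted = compact(files)

cold = DatastoreServer(files)
cold.switch_to(compacted, prefetch=False)
warm = DatastoreServer(files)
warm.switch_to(compacted, prefetch=True)

for key in ("a", "b", "c"):
    cold.read(key)
    warm.read(key)
```

In this toy run the cold server takes one miss per key after the switch, while the prefetched server takes none. A real block cache is bounded in size, so a realistic warmup would have to decide what to fetch and when, which this sketch does not model.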

Related articles

Cloud Platform Datastore Support (UCSB Technical Report #2011-08)

Recent technological advances in hardware and software have facilitated the explosive growth in the production of digital information. Cloud systems offer tremendous scale, resource availability, and ease of use, with which we can process this data in the pursuit of scientific, financial, social, and technological advances. However, there are many systems to choose from that differ in many wa...

Distributed Datastores: Towards Probabilistic Approach for Estimation of Dependability

This paper focuses on the contradiction that follows from Brewer’s Conjecture for distributed datastores: the need to deliver qualitative data to the user requires a guarantee of consistency, availability and stability of the system at the same time, but Brewer’s Conjecture claims that this is impossible. To overcome this contradiction in the paper it is suggested to estimate statistically viol...

Key-Value Datastores Comparison in AppScale

We present a simple framework that employs a single API – the Datastore API from the Google App Engine cloud computing platform – to interface to different open source distributed database technologies in use today. We use the framework to “plug in” different databases to the API so that they can be used by web applications and services without modification. The system facilitates empirical eva...

To Vote Before Decide: A Logless One-Phase Commit Protocol for Highly-Available Datastores

Highly-available datastores are widely deployed for online applications. However, many online applications are not content with the simple data access interface currently provided by highly-available datastores. Distributed transaction support is demanded by applications such as the large-scale online payment services of Alipay or PayPal. Current solutions to distributed transactions can spend more tha...

LoadIQ: Learning to Identify Workload Phases from a Live Storage

Eno Thereska began by relating some common abstractions used in today’s datastores. For example, key-value stores, file stores, and graph stores are all used to store data, but the different abstractions are often paired with certain assumptions and designs. When workloads change, we end up needing to redesign systems for a new set of assumptions. In his talk, Eno proposes the radical idea of u...

Journal:

PVLDB

Volume 8

Publication date: 2015
